Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Device Information Spec #287

Merged
merged 7 commits into from
Dec 2, 2020

Conversation

amorenoz
Copy link
Contributor

@amorenoz amorenoz commented Nov 19, 2020

As per the device-info specification, save the device information in a file using github.com/k8snetworkplumbingwg/network-attachment-definition-client
utility functions.

As per the design decisions made: Added the create/clean functions to the ResourcePool Interface and implement them in the netResourcePool so that the entire device information can be accessed

Error handling:

  • Startup: Do not stop the server creation if the DeviceInfo file creation fails, but log it at Error level
  • Cleanup: Do not stop cleaning if one DeviceInfo file deletion fails, try to clean them all and aggregate errors.

For now, only add the mandatory fields.

Fixes: #286

@amorenoz
Copy link
Contributor Author

cc @Billy99

@amorenoz amorenoz changed the title Add support for Device Information Spec WIP: Add support for Device Information Spec Nov 19, 2020
@amorenoz
Copy link
Contributor Author

With this change, having the following pools:

$ kubectl get node $HOSTNAME -o json | jq .status.allocatable
{
  "cpu": "8",
  "ephemeral-storage": "3570039515370",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "16Gi",
  "intel.com/intelVFs": "3",
  "intel.com/mlxPFs": "1",
  "intel.com/mlxVFs": "4",
  "memory": "15908252Ki",
  "pods": "110"
}

I get the following files:

$ tree /var/run/k8s.cni.cncf.io/devinfo/dp
/var/run/k8s.cni.cncf.io/devinfo/dp
├── intel.com-intelVFs-0000:05:00.2-device.json
├── intel.com-intelVFs-0000:05:00.3-device.json
├── intel.com-intelVFs-0000:05:00.4-device.json
├── intel.com-mlxPFs-0000:04:00.1-device.json
├── intel.com-mlxVFs-0000:04:00.3-device.json
├── intel.com-mlxVFs-0000:04:00.4-device.json
├── intel.com-mlxVFs-0000:04:00.5-device.json
└── intel.com-mlxVFs-0000:04:00.6-device.json
$ cat /var/run/k8s.cni.cncf.io/devinfo/dp/intel.com-mlxVFs-0000\:04\:00.3-device.json 
{"type":"pci","version":"0.1.0","pci":{"pci-address":"0000:04:00.3"}}
$ cat /var/run/k8s.cni.cncf.io/devinfo/dp/intel.com-intelVFs-0000\:05\:00.3-device.json 
{"type":"pci","version":"0.1.0","pci":{"pci-address":"0000:05:00.3"}}

Copy link

@Billy99 Billy99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

None of the comments are real issues, just nits. I tested with the version just before the PR was created and worked fine with pending device-info Multus PR. The PR comment indicated that only the mandatory fields were filled in. Is that the plan for the final PR or just because this is still [WIP] (just wondering, not pressuring to fill in the rest)?

go.mod Outdated Show resolved Hide resolved
pkg/netdevice/netResourcePool.go Outdated Show resolved Hide resolved
pkg/netdevice/netResourcePool.go Outdated Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

/lgtm

None of the comments are real issues, just nits. I tested with the version just before the PR was created and worked fine with pending device-info Multus PR. The PR comment indicated that only the mandatory fields were filled in. Is that the plan for the final PR or just because this is still [WIP] (just wondering, not pressuring to fill in the rest)?

I was thinking in pushing the mandatory fields first (pci-address) mainly because how the rest of the device info is handled is being discussed in #281

@amorenoz amorenoz force-pushed the rfe/npwg_devinfo branch 3 times, most recently from 3db1cc4 to eefbb48 Compare November 23, 2020 12:33
Copy link
Collaborator

@zshi-redhat zshi-redhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like CI failed, is it because due to not using 1.15 golang version?

pkg/netdevice/netResourcePool.go Outdated Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

looks like CI failed, is it because due to not using 1.15 golang version?

I'm not sure why it's failing. I'm looking into it but it's taking some time since I get errors in Travis that I cannot reproduce locally.

@amorenoz amorenoz changed the title WIP: Add support for Device Information Spec Add support for Device Information Spec Nov 25, 2020
@amorenoz
Copy link
Contributor Author

looks like CI failed, is it because due to not using 1.15 golang version?

I'm not sure why it's failing. I'm looking into it but it's taking some time since I get errors in Travis that I cannot reproduce locally.

@zshi-redhat, the CI instability issues I was facing are now solved. PTAL

Copy link
Member

@martinkennelly martinkennelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM. Performed as expected.

volumes:
- name: devicesock
hostPath:
path: /var/lib/kubelet/
- name: log
hostPath:
path: /var/log
- name: device-info
hostPath:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to specify
DirectoryOrCreate type ?

what if the path does not exist on the host ? will it be created ?

https://kubernetes.io/docs/concepts/storage/volumes/#hostpath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see any error when I tested it. The nad library would take care of creating the directory so from a DevicePlugin perspective, it should not care. However, it's not a bad idea to add that type flag.
Thanks

Save the deviceInfo files on server startup and remove them on server
clean up.

Signed-off-by: Adrian Moreno <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Move the creation of the mocked resourcePool out of BeforeEach because
it is used in a goroutine (the one that calls server.Stop()) and so
having a unique reference would cause race-conditions.

Signed-off-by: Adrian Moreno <[email protected]>
Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO changes are good. I also tested the change locally to get a better feel for the new behaviour.

Are you planning to add unit test for {Store/Clean}DeviceInfoFile` ?

nadutils can be put behind an interface to allow mocking

p.s
regarding my last comment indeed it seems to work when DirectoryOrCreate is not specified.
but no harm in keeping it IMO.

@amorenoz
Copy link
Contributor Author

IMO changes are good. I also tested the change locally to get a better feel for the new behaviour.

Are you planning to add unit test for {Store/Clean}DeviceInfoFile` ?

The current unit test verifies that Store/CleanDeviceInfoFile are called but not what it does internally

nadutils can be put behind an interface to allow mocking

Good idea, I did not think of that option. So we would have to:

  • Add an interface in types.go with the functions from 'nadutils' we use
  • Implement it (for example) in pkg/utils/utils.go
  • Add a function to the factory to create the implementation
  • Make the resourceManager pass the factory to server.Start() and server.Stop() (boy, we need to make this factory a proper singleton!)

Are we good with this approach?

@amorenoz
Copy link
Contributor Author

@martinkennelly, @adrianchiris, Talking to @zshi-redhat, we considered an alternative approach in order to make the unit tests cleaner: make resourceFactory a global singleton. It's anyway referenced all over the place and it makes little sense to carry it's pointer around.
However, that is a big change that would likely miss the upcoming release.
So the strategy would be to merge this PR as is, make the release, and then I'll work on making the resourceFactory global. Once that is done, I'll add Unit tests for the Store/CleanDeviceInfoFile

What do you think?

@adrianchiris
Copy link
Contributor

adrianchiris commented Nov 29, 2020

Good idea, I did not think of that option. So we would have to:

  • Add an interface in types.go with the functions from 'nadutils' we use
  • Implement it (for example) in pkg/utils/utils.go
  • Add a function to the factory to create the implementation
  • Make the resourceManager pass the factory to server.Start() and server.Stop() (boy, we need to make this factory a proper >singleton!)

NetResourcePool would either need the factory or an implementation to the "nadutils" interface. i dont think you need to pass it to server.Start(), server.Stop().

Are we good with this approach?

Generally Yes from my POV, See my comments below on the proper singleton bit.

@martinkennelly, @adrianchiris, Talking to @zshi-redhat, we considered an alternative approach in order to make the unit tests cleaner: make resourceFactory a global singleton. It's anyway referenced all over the place and it makes little sense to carry it's pointer around.
However, that is a big change that would likely miss the upcoming release.
So the strategy would be to merge this PR as is, make the release, and then I'll work on making the resourceFactory global. Once that is done, I'll add Unit tests for the Store/CleanDeviceInfoFile

resourceFactory is a signleton today, its New method returns the same instance.
having it "globally" available would require alot of changes in unit-test to swap its global implementation with a mocked one.
as well as code refactoring of the various packages. The benefits of doing so is not apparent to me (at least not in a substantial manner).

I think having resourceFactory instance as a member of other constructs follows the pattern of dependency injection, to help with unit tests and ease refactoring/replacing existing implementations of different interfaces. what you propose kinda deviates from that path.

I think the refactor for resourceFactory should be further discussed to reach an agreement, if you could open an issue on it to trigger discussion it would be great ! :)

If you think adding unit-test in the current state of code requires alot of code changes im fine with having it as a separate PR.
(and an issue to track that ofc)

@amorenoz
Copy link
Contributor Author

Good idea, I did not think of that option. So we would have to:

  • Add an interface in types.go with the functions from 'nadutils' we use
  • Implement it (for example) in pkg/utils/utils.go
  • Add a function to the factory to create the implementation
  • Make the resourceManager pass the factory to server.Start() and server.Stop() (boy, we need to make this factory a proper >singleton!)

NetResourcePool would either need the factory or an implementation to the "nadutils" interface. i dont think you need to pass it to server.Start(), server.Stop().

Are we good with this approach?

Generally Yes from my POV, See my comments below on the proper singleton bit.

@martinkennelly, @adrianchiris, Talking to @zshi-redhat, we considered an alternative approach in order to make the unit tests cleaner: make resourceFactory a global singleton. It's anyway referenced all over the place and it makes little sense to carry it's pointer around.
However, that is a big change that would likely miss the upcoming release.
So the strategy would be to merge this PR as is, make the release, and then I'll work on making the resourceFactory global. Once that is done, I'll add Unit tests for the Store/CleanDeviceInfoFile

resourceFactory is a signleton today, its New method returns the same instance.

I don't think that's really the case (although the intention does seem that one):

func NewResourceFactory(prefix, suffix string, pluginWatch bool) types.ResourceFactory {
if instance == nil {
return &resourceFactory{
endPointPrefix: prefix,
endPointSuffix: suffix,
pluginWatch: pluginWatch,
}
}
return instance
}

having it "globally" available would require alot of changes in unit-test to swap its global implementation with a mocked one.
as well as code refactoring of the various packages. The benefits of doing so is not apparent to me (at least not in a substantial manner).

I think having resourceFactory instance as a member of other constructs follows the pattern of dependency injection, to help with unit tests and ease refactoring/replacing existing implementations of different interfaces. what you propose kinda deviates from that path.

I think the refactor for resourceFactory should be further discussed to reach an agreement, if you could open an issue on it to trigger discussion it would be great ! :)

#300 created

If you think adding unit-test in the current state of code requires alot of code changes im fine with having it as a separate PR.
(and an issue to track that ofc)

I don't think it's a a lot of code, it just requires modifying the resourcePool struct. If that's acceptable I'd go ahead with it.

@adrianchiris
Copy link
Contributor

I don't think that's really the case (although the intention does seem that one):

you are right ! there is no assignment to instance although i expected to be one.

#300 created

Thanks!

I don't think it's a a lot of code, it just requires modifying the resourcePool struct. If that's acceptable I'd go ahead with it.

IMO its a reasonable change to facilitate UT of the added methods

@amorenoz
Copy link
Contributor Author

@adrianchiris PTAL at the last 3 commits (the rest of the PR unchanged)

@zshi-redhat
Copy link
Collaborator

CI failed because pkg/types/mocks/NadUtils.go file is not generated.

pkg/netdevice/netResourcePool_test.go Outdated Show resolved Hide resolved
pkg/netdevice/netResourcePool_test.go Outdated Show resolved Hide resolved
pkg/netdevice/nadutils.go Outdated Show resolved Hide resolved
Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amorenoz !!

LGTM

Copy link
Member

@martinkennelly martinkennelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the rework / noticing the singleton issue.

pkg/types/types.go Outdated Show resolved Hide resolved
In order to enable mocking the functions form
k8snetworkplumbingwg/network-attachment-definition-client, required for
unit testing, create a nadutils wrapper interface and the
implementation.

Use the resourceFactory to create instances of the implementation that
lives in netdevice.

Signed-off-by: Adrian Moreno <[email protected]>
In order to be able to unit test functions that use nadutils, use the
newly created interface in the netResourcePool.

Signed-off-by: Adrian Moreno <[email protected]>
Two unit tests are added:
netResourcePool_test verifies that the appropriate nadutils functions
are called with the correct arguments on
netResourcePool.{Store,Clean}DeviceInfoFile()

server_test.go verifies that the
resourcePool.{Store,Clean}DeviceInfoFile() functions are actually called
when the server starts and stops.

Signed-off-by: Adrian Moreno <[email protected]>
@zshi-redhat zshi-redhat merged commit 1e6a656 into k8snetworkplumbingwg:master Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Device Info Spec support
5 participants